1 Introduction

Goal: Assess the quality of Illumina paired-end reads associated with a bacterial pathogen, before and after trimming

Input: Raw Illumina paired-end reads in gzipped fastq format (*.fastq.gz)

Output: Trimmed Illumina paired-end reads in gzipped fastq format (*.fastq.gz)

Programs Used:
    • FastQC: quality control for high throughput sequence data
    • Trimmomatic: adapter and low-quality sequence removal for Illumina reads

1.1 Learning Outcomes

  • Navigate in a Linux/Unix environment from the command line
  • Learn and practice using commands to accomplish simple tasks
  • Run software from the command line
  • Retrieve files from a remote workstation
  • Assess the quality of Illumina sequencing reads using FastQC
  • Remove Illumina adapter sequences and low-quality bases using Trimmomatic

1.2 Background

1.2.1 Some terminal-ology

  • computer hardware: physical elements of a computer

  • computer software: instructions which tell computer hardware how to perform a task

  • application software: a division of computer software; software which uses the computer system to perform functions beyond the basic operation of the computer itself

  • system software: a division of computer software; software used to manage computer hardware behavior and provide a platform for other software to run

Figure 1: Differences between hardware, system software, and application software (Neeraj Mishra)
Figure 1: Differences between hardware, system software, and application software (Neeraj Mishra)
  • operating system (OS): a division of system software; system software that manages computer hardware and software resources and provides common services for computer programs. Examples include Microsoft Windows and macOS.

  • kernel: component of an operating system; software at the core of a computer’s operating system which is used to mediate access to the computer’s resources

  • central processing unit (CPU): electronic circuitry within a computer which is responsible for running or executing programs; CPUs carry out the instructions of a computer program by performing the basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions

  • CPU core: the processing unit of a CPU

  • thread: thread of execution; an ordered sequence of instructions that tells the computer what to do

Figure 2: Components of a CPU (Daniel López Azaña)
Figure 2: Components of a CPU (Daniel López Azaña)
  • random-access memory (RAM): a form of computer data storage which stores data and machine code that are currently being used

  • Unix shell: also known as a shell; a command-line interpreter that provides a command line user interface for Unix-like operating systems

  • terminal emulator: also known as a terminal; a text-based interface for typing commands, allowing users to interact with a Unix shell

  • graphical user interface (GUI): a form of user interface that allows users to interact with electronic devices through graphical icons and visual indicators

Figure 3: the Galaxy GUI, which provides a GUI for many bioinformatics tools (Wikipedia)
Figure 3: the Galaxy GUI, which provides a GUI for many bioinformatics tools (Wikipedia)
  • command-line interface (CLI): a means of interacting with a computer program where the user (or client) issues commands to the program in the form of successive lines of text (command lines)

1.2.2 What are Linux/Unix?

Unix is a family of operating systems that descend from the original Unix, which was developed in 1969 by Ken Thompson, Dennis Ritchie, Douglas McIlroy, and Joe Ossanna at AT&T’s Bell Laboratories in the United States (Ritchie, 1984, AT&T Bell Laboratories Technical Journal). In 1984, Bell Labs began distributing Unix as a proprietary product which users were not allowed to modify. Unix operating systems with which you may already be familiar are Apple’s macOS and Darwin (a component of what is now known as Apple’s macOS).

In 1983, Richard Stallman began the GNU Project with the goal of creating a completely free Unix-compatible software system (in Stallman’s own words: “Once GNU is written, everyone will be able to obtain good system software free, just like air.”). By 1990, nearly all of the major components of a GNU operating system were finished; however, one important piece was still missing: the kernel, which is where Linus Torvalds comes in. In 1991, while attending the University of Helsinki, Torvalds created a kernel which would later be released as the Linux kernel in 1992. By combining Linux with the GNU system, a fully functioning operating system was born. GNU/Linux, or Linux, refers to this family of free, open-source, Unix-like operating systems based on the Linux kernel. Popular Linux distributions include Ubuntu, Debian, and Red Hat.

Figure 4: Tux the penguin, the Linux mascot (Wikipedia)
Figure 4: Tux the penguin, the Linux mascot (Wikipedia)

For this workshop, we will be doing nearly all of our computing in a Linux environment because

  1. It’s free!
  2. Bioinformaticians have and continue to rely heavily on Linux; a significant proportion of the software, tools, and piplines for analyzing next-generation sequencing (NGS) data are written specifically for Linux/Unix systems and can only be run from the Linux/Unix command line
  3. An increasing number of web tools and desktop applications with graphical user interfaces (GUIs) are available for analyzing bacterial NGS data. However, even if you never, ever use Linux again, you will have a better understanding of what is going on “under-the-hood” for many of these applications

1.2.3 What is Bash?

Bash (Bourne-Again shell) refers to a Unix shell (a command-line interpreter that provides a command line user interface for Unix-like operating systems) and command language. It was written by Brian Fox for the GNU Project, with a beta version released in 1989. Since then, it has become the most popular Linux shell, as it is the default interactive shell for users on most Linux and macOS operating systems (which is why we will be using it for this workshop!). Bash allows users to type commands in a specific language, which the computer then executes.

1.2.4 The Linux File System

Before we start, it’s important to understand the structure of the Linux file system so we can navigate it from the command line.

If you’re a Windows user, you may be familiar with a file system that looks something like this:

Figure 5: Windows GUI (Wikipedia)
Figure 5: Windows GUI (Wikipedia)

Mac users might understand their file systems as something that looks like this:

Figure 6: Mac GUI (O’Reilly)
Figure 6: Mac GUI (O’Reilly)

The Linux file system isn’t so different; some popular Linux distributions, such as Ubuntu, might even have a very nice GUI, like the one below:

Figure 7: Ubuntu GUI (Wikipedia)
Figure 7: Ubuntu GUI (Wikipedia)

However, when working from the command line, our view won’t be so pretty. But even without a GUI, we can imagine the underlying structure of the Linux file system as something resembling an upside-down tree, like the image below:

Figure 8: The Linux File System (AskUbuntu)
Figure 8: The Linux File System (AskUbuntu)

Files, such as text files, image files, fastq files, fasta files, etc. are inside of directories. Directories are just another name for what one might call a “folder” in the Windows or Mac worlds. In the tree above (Figure 8), directories have green boxes around them, while files don’t.

If you look at the top of the tree (Figure 8), you’ll see a directory called the root directory, denoted by a forward slash (/). You can think of the root directory as the trunk of a tree from which all branches (paths) start.

One important thing to note before we begin is the difference between absolute paths and relative paths, which can be a frustrating concept for Linux command-line beginners. Paths are used to specify a particular file or directory; the full path, or absolute path to a file or directory starts from the root, /, and follows the branches of the files system, with each directory separated by a slash (/), until the file or directory of interest is reached. For example, in the tree above (Figure 8), if our file of interest is fluffy.jpg, the absolute path to fluffy.jpg is /home/sue/Pictures/pets/fluffy.jpg (just follow the shaded green boxes!). A relative path, on the other hand, specifies the path to a directory or file relative to another; usually, this is the directory you are currently in, also known as the working directory. So, for example, in the picture above (Figure 8), if we are currently inside the home directory, the relative path to fluffy.jpg is sue/Pictures/pets/fluffy.jpg (see how we didn’t start with a forward slash? That’s because we didn’t start at the root directory). If we move to the Pictures directory, our relative path to fluffy.jpg is pets/fluffy.jpg. Finally, if we move to the pets directory, the relative path to fluffy.jpg is just fluffy.jpg! However, the absolute path to fluffy.jpg remains /home/sue/Pictures/pets/fluffy.jpg.

Figure 9: Actual image of fluffy.jpg? (Wikipedia)
Figure 9: Actual image of fluffy.jpg? (Wikipedia)

Also, note that paths, file names, and directories are case-sensitive. For example, in the image above, if you search for fluffy.jpg at the following path /home/sue/pictures/pets/fluffy.jpg, Bash will throw an error saying that your file doesn’t exist. Why? Because the actual path is /home/sue/Pictures/pets/fluffy.jpg, where the Pictures directory has a capital “P”. If you don’t type the path correctly, Bash will look for a directory named pictures within /home/sue/, which doesn’t exist.

Finally, note that you should try to avoid naming files and directories with special chacters, particularly whitespace, slashes, asterisks, and other special characters. You’ll notice that bioinformaticians love underscores and dashes, so consider using those instead of hitting the space key when you’re naming a file or directory!

2 Activity 3.1

2.2 Assessing the quality of raw Illumina reads with FastQC

  1. Now that we’re getting the hang of the command line, let’s analyze some genomic data! Let’s first move into our activity1 directory:
cd activity1

If we type pwd, we will see that we are now inside of activity1:

pwd

If we type ls, we will see that our current directory (activity1) is empty:

ls
  1. For this activity, we will use raw reads associated with a clinical Salmonella enterica isolate, which are stored on a remote server. This particular Salmonella enterica strain was isolated in 2017 from the blood of a 77-year old man in Switzerland with no known travel history. The isolate underwent whole-genome sequencing (WGS) using an Illumina MiSeq platform and a Nextera XT library preparation kit.

The isolate’s genome was sequenced using paired-end reads: forward reads are stored in a file called mystery_salmonella_A_1.fastq.gz, and reverse reads are stored in a file called mystery_salmonella_A_2.fastq.gz. Both of these files are in fastq format (.fastq), compressed using gzip (.gz; a common method for compressing genomic data).

Let’s download the forward reads from the remote server to our local computer using the wget command:

wget http://beagle.henlab.org/public/course_introtobioinfo/mystery_salmonella_A_1.fastq.gz

If we type ls, we should see our forward reads in our current directory:

ls

Now, let’s download the reverse reads:

wget http://beagle.henlab.org/public/course_introtobioinfo/mystery_salmonella_A_2.fastq.gz

If we type ls, we’ll see that we now have forward and reverse reads in our current directory:

ls
Figure 10: fastq format (USEARCH)
Figure 10: fastq format (USEARCH)
  1. Now let’s install and run FastQC, a popular program that is used to assess the quality of sequencing reads. It will provide us with a lot of interesting and valuable metrics which can help guide our downstream analyses (e.g., read length, number of reads, per-base quality, adapter content). Here, we will use FastQC to assess the quality of the raw reads of our mystery isolate.
Figure 11: Output of FastQC (Babraham Bioinformatics)
Figure 11: Output of FastQC (Babraham Bioinformatics)

To install FastQC, let’s create a conda environment named fastqc by running the following:

conda create -n fastqc

Once that finishes, let’s activate our environment:

conda activate fastqc

Now, let’s install FastQC using conda, as described on the conda website:

conda install bioconda::fastqc

Once that finishes, we can check if FastQC is installed properly by running

fastqc --version

This will print the version of FastQC that is installed.

We can also run the following to print the FastQC help message (explains how to run FastQC, the different options and what they do, etc.):

fastqc -h
  1. Now let’s run FastQC on our raw Illumina reads!

The general format for running FastQC from the command line is as follows (note: don’t execute this command; it is just an example):

fastqc </path/to/first_reads.fastq.gz> </path/to/second_reads.fastq.gz> </path/to/third_reads.fastq.gz> </path/to/nth_reads.fastq.gz>

Let’s try running FastQC with our forward and reverse reads. Because they are already in our current directory, we can just supply the names of the files:

fastqc mystery_salmonella_A_1.fastq.gz mystery_salmonella_A_2.fastq.gz

Once FastQC is finished, we can type ls and see that we now have four additional files: a .zip and a .html file for our two .fastq.gz files with our forward and reverse Illumina raw reads:

ls
  1. Deactivate your fastqc conda environment:
conda deactivate
  1. Open the FastQC html file for your forward and reverse reads (it should open in your browser); what sort of information can we gather about our sample?

2.3 Trimming raw Illumina reads with Trimmomatic

  1. Now that we have a general idea of what the quality of our raw reads is like, let’s see if we can do better by cleaning up our raw reads. To that end, we’ll use Trimmomatic to trim our Illumina reads (i.e., remove Illumina adapter sequences and low-quality bases from our Illumina reads). Trimming can often improve read quality and, later, assembly. There are numerous trimming programs available (e.g., Trim Galore/cutadapt, fastp, bbduk, just to name a few), but here we will use Trimmomatic; it is a popular, fast, and relatively easy-to-use trimming program that, importantly, takes into account read pairing information for paired-end reads (most modern trimming software, including the Trimmomatic alternatives listed, should do this, but it’s always good to double-check beforehand if you try out something new).

Here, we will use Trimmomatic to:

A. Trim any Illumina adapters that are left over from the Nextera library preparation

B. Use a sliding window approach to measure average quality within a 4-base window, trimming when the average per-base quality drops below a Phred quality score of 15

C. Remove trailing bases (i.e., bases at the end of a read) which drop below a Phred quality score of 3

D. Remove leading bases (i.e., bases at the start of a read) which drop below a Phred quality score of 3

E. Remove reads less than 36 bp in length (note that employing a minimum read length filter like we are doing here can be especially important for analyses which require counting of mapped reads, such as RNA-Seq)

Let’s get trimming! Back in your terminal, make sure you are in your activity1 directory with the forward and reverse reads from our Mystery Salmonella A isolate (mystery_salmonella_A_1.fastq.gz and mystery_salmonella_A_2.fastq.gz):

pwd

should show that we are in our activity1 directory

ls

should show mystery_salmonella_A_1.fastq.gz and mystery_salmonella_A_2.fastq.gz

(If not: cd ~/activity1)

  1. Let’s install Trimmomatic using conda. First, let’s create a conda environment named trimmomatic:
conda create -n trimmomatic

Once our environment has been created, let’s activate it:

conda activate trimmomatic

Now, let’s install trimmomatic as described on the conda website:

conda install bioconda::trimmomatic

We can check that Trimmomatic has been installed properly by running the following:

trimmomatic -version

This command will print the version of Trimmomatic that has been installed.

  1. Let’s use Trimmomatic to trim our raw reads!

The command to run Trimmomatic for paired-end Illumina reads takes the following form (note: don’t execute this command; it is just an example):

trimmomatic PE -threads 1 -phred33 -trimlog log.txt -summary trimmomatic_summary.out input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:adapters-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

For this command, we are:

  • Calling Trimmomatic (trimmomatic)
  • Telling Trimmomatic that we have paired-end (PE) reads (PE)
  • Telling Trimmomatic how many threads we want to use (-threads 1)
  • Telling Trimmomatic that our Illumina reads have ASCII_BASE 33 quality scores and not the ASCII_BASE 64 quality scores used by very old Illumina platforms (-phred33)
  • Telling Trimmomatic to create a log file named log.txt (-trimlog log.txt)
  • Telling Trimmomatic to create a summary file named trimmomatic_summary.out (-summary trimmomatic_summary.out)
  • Giving Trimmomatic the path to our forward reads in fastq.gz format (input_forward.fq.gz)
  • Giving Trimmomatic the path to our reverse reads in fastq.gz format (input_reverse.fq.gz)
  • Telling Trimmomatic what we want to name the trimmed forward read pairs that it outputs (output_forward_paired.fq.gz)
  • Telling Trimmomatic what we want to name the trimmed forward unpaired reads that it outputs (output_forward_unpaired.fq.gz)
  • Telling Trimmomatic what we want to name the trimmed reverse read pairs that it outputs (output_reverse_paired.fq.gz)
  • Telling Trimmomatic what we want to name the trimmed reverse unpaired reads that it outputs (output_reverse_unpaired.fq.gz)
  • Telling Trimmomatic to remove any Illumina adapters, with the adapter sequences provided in the adapters-PE.fa file (ILLUMINACLIP:adapters-PE.fa:2:30:10)
  • Telling Trimmomatic to remove leading low quality or N bases below quality 3 (LEADING:3)
  • Telling Trimmomatic to remove trailing low quality or N bases below quality 3 (TRAILING:3)
  • Telling Trimmomatic to scan each read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15)
  • Telling Trimmomatic to drop reads less than 36 bases long (MINLEN:36)

Notice above that we need to tell Trimmomatic (i) which adapter sequences we want to trim, and (ii) where those adapter sequences are located on our computer. In our case, we know that the Illumina library preparation was carried out using a Nextera XT kit. Luckily, Trimmomatic has Nextera adapter sequences! To make it easier for you, you can simply download the Nextera adapter sequences from our remote server:

wget http://beagle.henlab.org/public/course_introtobioinfo/NexteraPE-PE.fa

If we type ls, we should see a file named NexteraPE-PE.fa in our current directory:

ls

Can you guess which type of file this is?

Alternatively, if you want a challenge: when you installed Trimmomatic using conda, the same exact Nextera adapter sequence file was downloaded to your local computer! Can you find where conda stored the adapter file? Try running the Trimmomatic command below using this file!

  1. Now that we have our adapter fasta file, let’s run Trimmomatic by typing in the following (very long!) command as follows, and then pressing “Enter”:
trimmomatic PE -threads 1 -phred33 -trimlog log.txt -summary trimmomatic_summary.out mystery_salmonella_A_1.fastq.gz mystery_salmonella_A_2.fastq.gz mystery_salmonella_A_trimmedP_1.fastq.gz mystery_salmonella_A_trimmedS_1.fastq.gz mystery_salmonella_A_trimmedP_2.fastq.gz mystery_salmonella_A_trimmedS_2.fastq.gz ILLUMINACLIP:NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

What are we doing when we run this?

  • Calling Trimmomatic (trimmomatic)
  • Telling Trimmomatic that we have paired-end (PE) reads (PE)
  • Telling Trimmomatic to use 1 thread (-threads 1)
  • Telling Trimmomatic that our Illumina reads have ASCII_BASE 33 quality scores and not the ASCII_BASE 64 quality scores used by very old Illumina platforms (-phred33)
  • Telling Trimmomatic to create a log file named log.txt (-trimlog log.txt)
  • Telling Trimmomatic to create a summary file named trimmomatic_summary.out (-summary trimmomatic_summary.out)
  • Giving Trimmomatic the path to our forward reads in fastq.gz format (mystery_salmonella_A_1.fastq.gz)
  • Giving Trimmomatic the path to our reverse reads in fastq.gz format (mystery_salmonella_A_2.fastq.gz)
  • Telling Trimmomatic what we want to name the trimmed forward read pairs that it outputs (we can technically name this file anything, but let’s call it mystery_salmonella_A_trimmedP_1.fastq.gz; the _1.fastq.gz suffix lets us know that the file contains forward reads, while trimmedP lets us know that these reads are trimmed read pairs)
  • Telling Trimmomatic what we want to name the trimmed forward unpaired reads that it outputs (similar to before, we can name this anything, but let’s call it mystery_salmonella_A_trimmedS_1.fastq.gz because they are trimmed forward SINGLE [unpaired] reads)
  • Telling Trimmomatic what we want to name the trimmed reverse read pairs that it outputs (to be consistent, mystery_salmonella_A_trimmedP_2.fastq.gz)
  • Telling Trimmomatic what we want to name the trimmed reverse unpaired reads that it outputs (to be consistent, mystery_salmonella_A_trimmedS_2.fastq.gz)
  • Telling Trimmomatic to remove Illumina Nextera paired-end adapters, with the adapter sequences provided in the NexteraPE-PE.fa file (ILLUMINACLIP:NexteraPE-PE.fa:2:30:10)
  • Telling Trimmomatic to remove leading low quality or N bases below quality 3 (LEADING:3)
  • Telling Trimmomatic to remove trailing low quality or N bases below quality 3 (TRAILING:3)
  • Telling Trimmomatic to scan each read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15)
  • Telling Trimmomatic to drop reads less than 36 bases long (MINLEN:36)
  1. Once Trimmomatic is finished, let’s take a look at the sizes of our fastq.gz files to get a rough idea of the effect trimming had on our genome. To do this, we can use ls with the -l and -h options to list our files and their respective sizes in human-readable format; we can additionally supply *.fastq.gz after the command, which allows us to only list files ending in .fastq.gz:
ls -lh *.fastq.gz
  1. Let’s deactivate our trimmomatic conda environment:
conda deactivate

2.4 Assessing the quality of trimmed Illumina reads with FastQC

  1. Now let’s run FastQC on our trimmed paired reads. First, let’s activate our fastqc conda environment:
conda activate fastqc
  1. Now, let’s run FastQC on our forward and reverse trimmed paired reads:
fastqc mystery_salmonella_A_trimmedP_1.fastq.gz mystery_salmonella_A_trimmedP_2.fastq.gz
  1. Once FastQC finishes, open the html files for your trimmed forward and reverse reads in your browser (i.e., mystery_salmonella_A_trimmedP_1_fastqc.html and mystery_salmonella_A_trimmedP_2_fastqc.html, respectively). Has the quality of your reads improved?

  2. Deactivate your conda environment:

conda deactivate

3 References

Babraham Bioinformatics. FastQC: a quality control tool for high throughput sequence data. Accessed April 14, 2019. URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Bolger, A.M., M. Lohse, and B. Usadel. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15): 2114-2120. DOI: 10.1093/bioinformatics/btu170.

JGI. BBDuk: Decontamination Using Kmers. Accessed April 15, 2019. URL: https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/

Krueger, F. and Babraham Bioinformatics. Trim Galore: a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisufite-Seq) libraries. Accessed April 15, 2019. URL: http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/.

Martin, M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17(1): 10-12. DOI: https://doi.org/10.14806/ej.17.1.200.

Ritchie, D.M. 1984. The UNIX system: The evolution of the UNIX time-sharing system. AT&T Bell Laboratories Technical Journal 63(8): 1577-1593. DOI: 10.1002/j.1538-7305.1984.tb00054.x.

Schubert, M., S. Lindgreen, and L. Orlando. 2016. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes 9: 88. DOI: 10.1186/s13104-016-1900-2.

4 Linux Cheat Sheet

Print the contents of a file:

cat <path/filename>

Move to a different directory:

cd <path>

Move up one directory:

cd ..

Move to the root directory:

cd /

Move to your home directory:

cd ~

Copy the source file to the destination:

cp <source> <destination>

Copy the source directory to the destination:

cp -r <source> <destination>

Download a file from an online source (wget):

wget filepath

Show disk usage:

df

Show directory space usage:

du

Find all instances of file in path with given name:

find <path> -name <pattern>

Show memory and swap usage in human-readable format:

free -h

Print all appearances of pattern in file:

grep <pattern> <path/filename>

Count all appearances of pattern in file:

grep -c <pattern> <path/filename>

Gzip file(s):

gzip path/filename

Decompress gzipped file(s):

gunzip path/filename.gz

Print the first 10 lines of a file:

head <path/filename>

Print system hostname:

hostname

List the given path:

ls <path>

List the given path, showing permissions, size, and modification dates in human readable format:

ls -lh <path>

List the given path, showing permissions, size, and modification dates in human readable format, sorting by filesize from largest to smallest:

ls -lhS <path>

List the given path, showing permissions, size, and modification dates in human readable format, sorting by filesize from smallest to largest:

ls -lhSr <path>

View the manual for a command (press q to exit manual):

man <command>

Create a directory:

mkdir <path/directory_name>

Move the source file or directory to the destination:

mv <source> <destination>

Print working directory (where am I?):

pwd

Remove a file:

rm <path/filename>

Remove an empty directory:

rm -d <path/directory>

Remove a non-empty directory (use with EXTREME caution!):

rm -r <path/directory>

Secure copy a file from remote server to the dir directory on your machine:

scp user@host:file dir

Secure copy a file from your machine to the dir directory on a remote server:

scp file user@host:dir

Secure copy the directory foo from remote server to the directory bar on your machine:

scp -r user@host:foo bar

Execute a shell script, script.sh:

sh script.sh

Connect to host as user:

ssh user@host

Print the last 10 lines of a file:

tail <path/filename>

Create a tar file, file.tar:

tar -cf file.tar path/directory

Extract a tar file, file.tar:

tar -xf file.tar

Create a gzipped tar file, file.tar.gz:

tar -czf file.tar.gz path/directory

Extract a gzipped tar file, file.tar.gz:

tar -xzf file.tar.gz

Print who you are logged in as:

whoami